grio(5)                                                        grio(5)

NAME
     grio - guaranteed-rate I/O

DESCRIPTION
     Guaranteed-rate I/O (GRIO) refers to a guarantee made by the
     system to a user process indicating that the given process will
     receive data from a system resource at a predefined rate
     regardless of any other activity on the system.  The purpose of
     this mechanism is to manage the sharing of scarce I/O resources
     amongst a number of competing processes, and to permit a given
     process to reserve a portion of the system's resources for its
     exclusive use for a period of time.

     Currently, the only system resources that can be reserved using
     the GRIO mechanism are files stored on an XFS filesystem.

     A GRIO guarantee is defined as the number of bytes that can be
     read from or written to a given file by a given process over a
     set period of time.  If a process has a GRIO guarantee on a file,
     it can write data to or read data from the file at the guaranteed
     rate regardless of other I/O activity on the system.  If the
     process issues I/O requests at a size or rate greater than the
     guarantee, the behavior of the system is determined by the type
     of rate guarantee.  The excess requests may be blocked until such
     time as they fall within the scope of the guarantee, or the
     requests may be allowed up to the limit of the device bandwidth
     before they are blocked.
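
     The definition above (a byte count per period, with excess
     requests either blocked or allowed up to the device bandwidth)
     can be illustrated with a toy model.  This is only a sketch under
     stated assumptions: it is written in Python for illustration, it
     is not part of the grio library, and the names Guarantee,
     classify_request, and allow_excess are hypothetical.  It does not
     describe how ggd or the kernel actually account for I/O.

```python
from dataclasses import dataclass


@dataclass
class Guarantee:
    """Toy model of a rate guarantee: a byte count per time period."""
    bytes_per_period: int   # bytes the process may transfer each period
    period_seconds: float   # length of one accounting period

    def rate(self) -> float:
        """Guaranteed rate in bytes per second."""
        return self.bytes_per_period / self.period_seconds


def classify_request(guarantee, used_bytes, request_bytes,
                     device_bytes_per_period, allow_excess):
    """Classify one request issued during the current period.

    used_bytes is what the process has already transferred in this
    period.  allow_excess is a hypothetical switch standing in for the
    guarantee types that let requests run up to the device bandwidth.
    Simplification: the device limit is treated as if no other process
    were using the device.
    """
    total = used_bytes + request_bytes
    if total <= guarantee.bytes_per_period:
        return "within guarantee"
    if allow_excess and total <= device_bytes_per_period:
        return "excess allowed"
    return "blocked"
```

     For example, with a guarantee of 256K bytes per second, a request
     that would push the process past 256K bytes in the current second
     is classified as blocked unless allow_excess is set and the device
     limit has not been reached.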

     The following types of rate guarantees are supported:

     PER_FILE_GUAR - the GRIO reservation is associated with a single
          file and may not be transferred
     PER_FILE_SYS_GUAR - the GRIO reservation may be transferred to
          any file on a given file system

     PROC_PRIVATE_GUAR - the GRIO reservation may not be transferred
          to another process
     PROC_SHARE_GUAR - the GRIO reservation may be transferred to
          another process

     FIXED_ROTOR_GUAR - the GRIO reservation is the VOD (i.e. ROTOR)
          type of reservation and the rotor position is established
          at the start of the reservation
     SLIP_ROTOR_GUAR - the GRIO reservation is the VOD (i.e. ROTOR)
          type of reservation and the rotor position will vary
          according to the access pattern of the process
     NON_ROTOR_GUAR - the GRIO reservation is a regular (i.e.
          NONROTOR) type of reservation

     REALTIME_SCHED_GUAR - the GRIO reservation specifies the rate at
          which data will be provided to the process
     NON_SCHED_GUAR - the I/O requests associated with the GRIO
          reservation are non-scheduled; this will affect other GRIO
          reservations on the system

     Only one GRIO reservation characteristic may be chosen from each
     group.
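
     The one-per-group rule can be sketched as a small consistency
     check.  The characteristic names are the ones listed above; the
     grouping into four sets is inferred from their descriptions and
     is an assumption, as are the group labels and the function name
     check_characteristics.  The sketch is in Python for illustration
     only and implies nothing about the numeric flag values used by
     the grio library.

```python
# Hypothetical grouping of the characteristic names from this page.
# The group labels are invented for this sketch.
GROUPS = {
    "file scope": {"PER_FILE_GUAR", "PER_FILE_SYS_GUAR"},
    "process scope": {"PROC_PRIVATE_GUAR", "PROC_SHARE_GUAR"},
    "rotor": {"FIXED_ROTOR_GUAR", "SLIP_ROTOR_GUAR", "NON_ROTOR_GUAR"},
    "scheduling": {"REALTIME_SCHED_GUAR", "NON_SCHED_GUAR"},
}


def check_characteristics(chosen):
    """Return a list of problems; an empty list means no conflicts."""
    chosen = set(chosen)
    known = set().union(*GROUPS.values())
    problems = []
    # Report names that do not appear in any group.
    for name in sorted(chosen - known):
        problems.append(f"unknown characteristic: {name}")
    # Report groups from which more than one name was picked.
    for group, members in GROUPS.items():
        picked = sorted(members & chosen)
        if len(picked) > 1:
            problems.append(
                f"{group}: choose only one of {', '.join(picked)}")
    return problems
```

     Calling check_characteristics(["FIXED_ROTOR_GUAR",
     "SLIP_ROTOR_GUAR"]) reports a conflict in the rotor group, while
     a selection with at most one name per group returns an empty
     list.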

     The PROC_SHARE_GUAR, non REALTIME_SCHED_GUAR, and NON_ROTOR_GUAR
     characteristics are set by default.

     There are a number of components in the GRIO mechanism.  The
     first is the guarantee-granting daemon, ggd.  This is a user
     level process that is started when the system is booted.  It
     controls the granting of guarantees, the initiation and
     expiration of existing guarantees, and the monitoring of the
     available bandwidths of each I/O device on the system.  User
     processes communicate with the daemon via the grio library using
     the following calls:

     grio_associate_file()  - associate a file with a guarantee
     grio_query_fs()        - query filesystem
     grio_action_list()     - issue list of GRIO reservation requests
     grio_reserve_file()    - issue GRIO reservation request
     grio_reserve_fs()      - issue GRIO reservation request
     grio_unreserve_bw()    - remove grio reservation

     When ggd is started, it reads the file /etc/grio_disks and uses
     its inbuilt system knowledge to determine the bandwidths of the
     various devices on the system.  The /etc/grio_disks file may be
     edited by the system administrator to tune performance.  If ggd
     is terminated, all existing rate guarantees are removed.

     The next component of the GRIO mechanism is the XLV volume
     manager.
     Rate guarantees may be obtained from files on the real-time and
     non-real-time subvolumes of an XFS filesystem as well as non-XLV
     disk partitions having XFS filesystems.  The disk driver command
     retry mechanism is disabled on the disks that make up the
     real-time subvolume.  This means that if a drive error occurs,
     the data is lost.  The intent of real-time files is to read/write
     data from the disk as rapidly as possible.  If the device driver
     is forced to retry one process's disk request, it causes the
     requests from other processes to become delayed.

     If one partition of a disk is used in a real-time subvolume, the
     entire disk is considered to be used for real-time operation.  If
     one disk on a SCSI controller is used for real-time operation
     then all the other devices on that controller must be used for
     real-time operation as well.

     In order to use the guaranteed-rate I/O mechanism effectively,
     the XLV volume and XFS filesystem must be set up properly.  The
     next section gives an example.

     By default, the ggd daemon will allow four process streams to
     obtain rate guarantees.  If support for more streams is desired,
     it is necessary to obtain licenses for the additional streams.
     The license information is stored in the /usr/var/netls/nodelock
     file and interpreted by the ggd daemon on startup.

     While configuring a system that will be used for guaranteed-rate
     I/O, it is important to recognize that the system setup can
     affect grio.  For example, if non-standard disk drives are added
     and the grio_disks file updated, this might have an impact on the
     performance of the SCSI controllers (which should then be tuned
     through grio_disks).  Also, it is recommended that the real-time
     disks for which grio is needed be placed by themselves on the
     SCSI bus.
     This is because devices like CDROMs and tapes might hold up the
     bus if accessed while a grio request is being serviced.  Grio
     does not model file system metadata writes, so keeping the data
     and log volumes of the XLV on different SCSI busses from the
     real-time disks helps in satisfying the guarantees.  For
     real-time disks, it is strongly recommended that the disks be
     partitioned to have only one partition, the real-time partition.
     Finally, some system components are used for I/O as well as other
     data traffic, so it is important not to overload these components
     with other data requests while grio requests are being serviced.

EXAMPLE
     The example in this section describes a method of laying out the
     disks, filesystem, and real-time file that enables the greatest
     number of processes to obtain guarantees on a single file
     concurrently.  It is not necessary to construct a file in this
     manner in order to use GRIO; however, fewer processes will then
     be able to obtain rate guarantees on the file.  It is also not
     necessary to use a real-time file; however, guarantees obtained
     on non-real-time files can be considered "soft" guarantees at
     best, which may not be sufficient for some applications.

     Assume that there are four disk partitions available for the
     real-time subvolume of an XLV volume.  Each one of the partitions
     is on a different physical disk.

     Before setting up the XFS filesystem, the I/O request size used
     by the user process must be determined.  In order to get the
     greatest I/O rate, the file data should be striped across all the
     disks in the subvolume.  To avoid filesystem fragmentation and to
     force all I/O operations to be on stripe boundaries, the file
     extent size should be an integral multiple of the volume stripe
     width.  Rate guarantees should be made with sizes equal to
     integral multiples of I/O request sizes.  All I/O request sizes
     must be integral multiples of the optimal I/O size of the
     underlying disk devices.
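
     The three sizing rules above can be checked mechanically before
     the volume is built.  The following is a minimal sketch in
     Python, used here for illustration only; the function name
     check_sizes and its parameters are hypothetical, and the sample
     sizes in the note below are illustrative values rather than
     requirements.  The rules are read as requiring whole-number
     multiples, so a size equal to its base (a multiple of one) is
     acceptable.

```python
KIB = 1024  # bytes per kilobyte, for readable sample sizes


def check_sizes(optimal_io, stripe_width, extent, request, guarantee):
    """Check the sizing rules; all arguments are sizes in bytes.

    Returns a list of violated rules (empty if the sizes are
    consistent with all three rules).
    """
    problems = []
    # Rule 1: the file extent size is a multiple of the stripe width.
    if extent % stripe_width != 0:
        problems.append("extent size is not a multiple of the stripe width")
    # Rule 2: the guarantee size is a multiple of the I/O request size.
    if guarantee % request != 0:
        problems.append("guarantee size is not a multiple of the request size")
    # Rule 3: the request size is a multiple of the optimal I/O size.
    if request % optimal_io != 0:
        problems.append("request size is not a multiple of the optimal I/O size")
    return problems
```

     With an optimal I/O size of 64K bytes and a stripe width, extent
     size, request size, and guarantee size of 256K bytes each,
     check_sizes returns an empty list.  Changing the extent size to
     384K bytes makes it report that the extent size is not a multiple
     of the stripe width.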

     The optimal I/O size is specified on a per-device basis in the
     /etc/grio_disks file.  The disk device characteristics for
     optimal I/O sizes of 64K, 128K, 256K, and 512K bytes are
     supplied.  The grio_bandwidth(1M) utility can be used to
     determine the device characteristics for different optimal I/O
     sizes.

     For simplicity, this example will use an optimal I/O size of 64K
     bytes.  Also, the stripe size of the XLV real-time subvolume for
     this file system will be set to an integral multiple of 64K
     bytes.  If there are four disks available, let the stripe step
     size be equal to 64K bytes, and the volume stripe width be 256K
     bytes.  The file extent size should be set to a multiple of the
     volume stripe width.  In this example, let the file extent size
     be equal to the stripe width.  Assume that the application always
     issues I/O requests in sizes equal to the extent size.

     Once the XLV volume and XFS filesystem have been created, the
     application can create the real-time file.  Real-time files must
     be read or written using direct, synchronous I/O requests.  (This
     is also true for GRIO accesses to non-real-time files.)  The
     open(2) manual page describes the use of direct I/O and its
     buffer alignment restrictions.  When creating a real-time file,
     the F_FSSETXATTR command must be issued to set the
     XFS_XFLAG_REALTIME flag.  This can only be issued on a newly
     created file.  It is not possible to mark a file as real-time
     once non-real-time data blocks have been allocated to it.

     After the real-time file has been created, the application can
     issue a grio_reserve_fs(3X) and grio_associate_file(3X) pair to
     obtain the rate guarantee.

     Once the rate guarantee is established, any read or write
     requests that the application issues to the file will be
     completed within the parameters of the guarantee.  This will
     continue until the file is closed, the guarantee is removed by
     the application via grio_unreserve_bw(3X), or the guarantee
     expires.

     Any process can use the grio_associate_file() call to switch the
     GRIO reservation to itself if the PROC_SHARE_GUAR characteristic
     is set.  This causes the first process to lose the rate guarantee
     and the second process to receive it.  Similarly, the
     grio_associate_file() call can be used to switch the GRIO
     reservation from one file to another, within the same filesystem,
     if the PER_FILE_SYS_GUAR characteristic is set.

DIAGNOSTICS
     If a rate cannot be guaranteed, ggd returns an error to the
     requesting process.  It also returns the amount of bandwidth
     currently available on the device.  The process can then
     determine if this amount is sufficient and if so issue another
     rate guarantee request.

FILES
     /etc/grio_disks
     /usr/var/netls/nodelock

SEE ALSO
     ggd(1M), grio(1M), grio_bandwidth(1M), grio_associate_file(3X),
     grio_query_fs(3X), grio_action_list(3X), grio_reserve_file(3X),
     grio_reserve_fs(3X), grio_unreserve_bw(3X), grio_disks(4)

NOTES
     To make grio more secure, processes requesting guaranteed-rate
     I/O need the CAP_DEVICE_MGMT privilege or root permissions;
     otherwise their requests will fail.